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Abstract We present an algorithm to generate application-specific, global reduced order quadratures (ROQ) for mul- 
tiple fast evaluations of weighted inner products between parameterized functions. If a reduced basis (RB) or any other 
projection-based model reduction technique is applied, the dimensionality of integrands is reduced dramatically; how- 
ever, the cost of evaluating the reduced integrals still scales as the size of the original problem. In contrast, using discrete 
empirical interpolation (DEIM) points as ROQ nodes leads to a computational cost which depends linearly on the di- 
mension of the reduced space. Generation of a reduced basis via a greedy procedure requires a training set, which for 
products of functions can be very large. Since this direct approach can be impractical in many applications, we propose 
instead a two-step greedy targeted towards approximation of such products. We present numerical experiments demon- 
strating the accuracy and the efficiency of the two-step approach. The presented ROQ are expected to display very fast 
convergence whenever there is regularity with respect to parameter variation. We find that for the particular application 
here considered, one driven by gravitational wave physics, the two-step approach speeds up the offline computations 
to build the ROQ by more than two orders of magnitude. Furthermore, the resulting ROQ rule converges exponentially 
with the number of nodes, and a factor of ~ 50 savings, without loss of accuracy, is observed in evaluations of inner 
products when ROQ are used as a downsampling strategy for equidistant samples using the trapezoidal rule. While the 
primary focus of this paper is on quadrature rules for inner products of parameterized functions, our method can be 
easily adapted to integrations of single parameterized functions, and some examples of this type are also considered. 



1 Introduction 

Many application areas deal with parameterized problems. Throughout this paper, we consider these to be a set of 
functions 

f w ■■= {h^ ■■ n ^> c \ n e r^ft e c (n)}, 
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where SI, V denote the physical and parameter domains, respectively, C is the set of complex numbers, C (SI) is the 
set of continuous functions on compact domain ficl and F w C Hw denotes a compact subset of the Hilbert space 
%W := L W (S1). In general, both SI and V are multi-dimensional and can be irregular domains. Here we assume V to 
be compact in M. N and take the scalar product and norm between two functions f,g€ H\\r t0 be 

(f,g) Llv := I f(x)g(x)W(x)dx, ||/||^ := (fJ} L ^, 0) 

with < W G C (SI) some weight function and /* denoting complex conjugation. We absorb the weight above into 
each integrand function througrrl 

/(*)-> Wi(x)f(x), 

with x € SI, and define the set of functions 

T = {h^VW eC(n) I hf, e F w } , 

which is now a compact subset of the Hilbert space H := L 2 (S2). This allows us, without loss of generality, to use the 
standard I? inner product and the corresponding norm for elements of H, i.e., 

</,<?>= / f(x)g(x)dx, ll/f :={/,/). (2) 

Furthermore, the discrete I? inner product and the corresponding norm are here denoted by 

M 

{f,g)& = S ^u k f*(x k )g{x k ), H/lld = </,/>d , (3) 
fe=i 

where {xk^k}^=i &m arbitrary quadrature points and weights. Additionally, we often denote discrete objects and 
vectors through bold notation, for example, f = (f(xi), . . . , f(xj^)J and u> = (ui, . . . ,um) . More details about 
the notation used throughout this paper are given in Appendix|A] We assume exactness between continuous and discrete 
inner products within machine precision. 

There are several challenges associated with parameterized problems and Reduced Order Modeling (ROM). One 
of them, when the set of functions T is not known a priori but requires expensive numerical simulations, is how to 
select "on the fly" which parameters to solve for in a nearly optimal way, and a compact -application-specific spectral- 
representation for every element in T. Reduced Basis (RB) [1] is a leading candidate for such problems. We refer to 
Section [3~T| for details about it; for the purposes of this paper it suffices to describe in that section the algorithm and 
some of its key features. Another often desired aspect of any approach to parameterized problems is, once a reduced 
basis has been built, the ability to accelerate the computation of a particular quantity of interest. This can be fast online 
prediction of new solutions by solving a reduced problem (TJ or, as in the case of this paper, fast online evaluations of 
inner products between elements of T . Additional benefits may be gained from an interpolation-based approximation. 
The leading candidates for this are the Empirical Interpolation Method (EIM) |4 5| and its discrete counterpart the 
Discrete Empirical Interpolation Method (DEIM) [2,3]. Again, for the purposes of this paper it suffices to summarize 
the algorithm in Section [3~2] and some of its properties. 

The specific goal of this paper is to build application-specific global quadratures for inner products of functions 
(such as those appearing in convolutions, matched filtering, Markov Chain Monte Carlo simulations, etc.) combining 
both ROM and the DEIM, which for briefness we refer to as reduced order quadratures (ROQ). Here we explicitly 
consider RB for ROM, but the approach remains the same for other choices of bases generated using, for example, 
Proper Orthogonal or Singular Value Decompositions (POD, SVD) [6|. 

The construction of ROQ is somewhat similar in spirit to that one of Gaussian quadratures: the function to be 
integrated is approximated by a truncated expansion in a reduced basis, this expansion is replaced by interpolation (here 
DEIM) at special nodal points and, finally, the interpolant integrated to compute effective quadrature weights for any 
scalar product of functions in the set of interest T . For many applications of interest, there are several advantages of 
ROQ with respect to Gaussian quadratures, though. First, regularity with respect to parameter variation, not with respect 
to the integration variable, is exploited to provide very fast (in many cases of interest, exponentially fast) convergence. 

1 For generic target applications we envision these functions not to be polynomials, and the weights not to be standard ones. In particular, 
our proposed reduced order quadrature construction does not rely on classical orthogonal polynomial theory for which the precise form of the 
weight W is essential. 
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For example, for the gravitational wave test cases considered in this paper, ROQ prove to be more efficient than Gaussian 
quadratures (see Fig. [8}. In addition, the method is adaptive by nature, and the quadrature nodes are hierarchical. Our 
approach also gives, when acquiring data from experiments or observations, a nearly optimal way of downsampling it 
for matched filtering [ 7 , 8 ] or other purposes, while at the same time preserving the accuracy in computing integrals with 
the full data set. All these benefits of ROQ come at the cost of a potentially expensive offline procedure to pre-assemble 
both the basis and the ROQ rule. For many applications where multiple fast evaluations are required, especially if they 
are needed online or in real time, this tradeoff is indeed desirable. 

To the best of our knowledge, leveraging the advantages of the Empirical Interpolation Method for fast numerical 
integration was first suggested by Maday et al [4| (see also Refs. [9, 10|) and further investigations were carried out by 
Aanonsen [10|. However, our approach differs from previous ones in several respects. First, the empirical interpolant 
coefficients are usually found by carrying out a potentially costly (compared to a competitive quadrature rule) matrix- 
vector multiplication. Here we absorb this cost into an offline computation of the reduced order quadrature weights and 
explicitly write the ROQ rule as a vector-vector product (see Eq. (jTOj) in Section [2]l. Additionally, the RB-DEIM model 
reduction used here allows for a natural factorization of the parameter and physical dependences. Finally and most 
important, our main objective here is fast computation of inner products of the form , }. While this is closely 
related to the integration of single functions in T, for large problems there are practical and serious obstacles towards a 
reduced basis representation of products of functions, denoted here as 

? = i h U h H 6C(fl) | , h N G T} , (4) 



in terms of sizes. Most notably very large training spaces. To address this we suggest a simple two-step approach 
targeted towards such products which reuses the algorithms necessary for reducing the underlying lower order space 
T, with dramatic savings in the offline stage. Without this two-step procedure and/or adaptive sampling of the training 
space, building ROQ would be simply not possible in many realistic application problems driving our work, even if 
carried out offline and using large supercomputers. 

This paper is organized as follows. Section[2]provides an overview of the main ideas to be developed throughout the 
paper as well as brief preliminary discussions on quadrature rules, RB-DEIM, and reduced order quadratures. Section[3] 
collects necessary results and algorithms required for developing ROQ. In particular, in Section [3J| we describe a RB- 
greedy model reduction algorithm to generate a set of ba sis ve ctors which are subsequently interpolated at a set of points 



chosen by the DEIM algorithm as summarized in Section 3.2 The main difficulties in constructing a reduced basis for T . 



as well as our approach for overcoming them, are considered in Section [33] Fast integrations using the reduced model 
and ROQ are constructed in Section [4~T] along with some error estimates in Section |4~2| Straightforward extensions of 
the ROQ algorithm for resampled functions are discussed in Section [43] Two numerical experiments are documented 
in Section [5] To compare with well known results in one dimension, our first experiment considers polynomials on 
the standard interval [—1,1] as basis, and integration of single functions. The second example draws from a non-trivial 
problem in gravitational wave physics and showcases the potential savings of ROQ to compute scalar products in large 
problems. In Section[6]we summarize the results of this paper, and comment on their possible extensions. 



2 Overview and main ideas 

The main goal of this paper is to efficiently compute approximations to integrals such as those in Eq. (2). We refer to 
the exact, continuum one as I c , 

lS,j) ■= (h^,h H ) , (5) 

where {m, fij} are any two values in the parameter space P. Evaluation of this integral may be costly and for some 
applications must be repeatedly computed, perhaps in real-time; in such situations a Gaussian quadrature rule might 
be either limiting or unavailable (for example, if the integrands are not smooth enough or their value is not known at 
Gaussian nodes). Problems which depend on a high dimensional parameter or physical spaces constitute such cases. In 
addition, if the target functions {h /lj } come from acquired data, each function h /lj is usually sampled at equally spaced 
or scattered points and the integrations are usually carried out using an extended trapezoidal rule, with slow convergence 
in general. 

Next we discuss different approximations to continuum inner products I c such as those in Eq. {3). 

Integration by quadrature: Let us recall a typical quadrature rule, since the proposed ROQ of this paper follow a 
somewhat similar pattern. With {xfe, w/-} fe=1 denoting an arbitrary set of quadrature points and weights, respectively, 
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we can approximate the integral §5§ as 

M 

Ic{i,j) « h(ij) ■= {/i w ,Vi><i = ^u k h* H (x k ) h H (x k ) . (6) 

k=l 

Many integration rules can be written in the form of Eq. {6}, including the extended trapezoidal rule, Gaussian quadra- 
tures, and integration using domain decomposition. Among these many choices an optimal strategy should allow for a 
given target integration accuracy with the smallest number of operations. If the number of quadrature points M is not too 
large, one can evaluate |6]l with standard methods provided not too many integrals are required. However, if M is large 
(perhaps the functions are sharply peaked, have different length scales and/or many cycles) and/or many evaluations of 
I c are needed, computing the approximations Id can become very costly. 

Integration in a reduced space: A preliminary reduced order approximation to Id(i,j) would involve using an expan- 
sion in basis vectors {e^}™ =1 . Without loss of generality we can assume that the basis is orthonormal with respect to the 
discrete[^]inner product (•, •)&. Then any function hp £ T can be approximated by 

n 

1=1 

where we have introduced the orthogonal projection operator V n , and an approximation to (|6} would be 

n 

h(i,j) ~ (Pnh^^Vnh^a = y^(fe/x i; e fc )d(e fc ,fe Mj .) d . (7) 

k=l 



Now, computing the approximation (|7} requires knowledge of the projection coefficients {(fy^, e /s)d}. Modulo the 
latter, performing an integration via Eq. i|7} is of improved computational cost (n multiplications and n — 1 additions) 
whenever the number of basis elements is smaller than the number of quadrature points, n < M. A greedy construction 
of a reduced basis (see Sec. |3.1[ l, as opposed to a standard basis choice such as Jacobi polynomials, involves the use of 
a problem-dependent training space. If the functions to be integrated, namely h l _ li and , are members of this training 
space, the projection coefficients on the right hand side of Eq. l|7]l have been precomputed offline while building the 
basis (see Section 3.1 1. The more interesting case in practice is that one in which h iii and were not members of the 



training space, or one is unable to store the set of projection coefficients. 

Integration with reduced order quadratures (ROQ): We advocate an application-specific quadrature scheme for 
efficiently evaluating the discrete integral in f5b, with essentially no loss of accuracy, with a computational cost that 
depends only on the reduced basis space dimension, and without requiring projection coefficients. This new approach 
combines the flexibility of quadrature rules with powerful dimensionality reduction. We seek to approximate integrands 
of the form g^ = h*^. , which is where the main challenge lies; namely, in integrating products of target functions 
and not individual ones. Throughout this paper tildes will refer to approximations or quantities related to _F, defined in 
Eq. We are therefore seeking to approximate integrals of 

m 



with Qij 6 T, and a truncated expansion is carried with a set of basis vectors {e£}JL 1 approximating elements in T. 
As before, we are faced with computing or approximating the projection coefficients (eg, g^j )a, the cost of which can in 
principle be expensive. Discrete empirical interpolation (DEIM) offers an attractive alternative: consider these m basis 
{ze}i=i< l et {Pt}e=l C { s fc}fc=i be a set of points (the generation of which will be described later), and let the DEIM 
interpolant of gij be 



rn 

Zm[gij] ■= yZjei , 
t=i 



2 In carrying out the RB-greedy algorithm we resolve the integrals within machine precision, so for all practical purposes the discrete and 
continuum scalar products agree with each other. 
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where the interpolation coefficients {c^}^_j are so that Im[gij](Pk) = 9ij{'Pk)- This means that we enforce nodal 
exactness of I m and {ci}JL 1 solves the linear system 

m 

y]ctet(pk) = 9ij(Pk), k=l,...,m. (8) 
t=x 

If the interpolant is accurate then g^ w X m \gij\ and, substituting this into Eq.(|6}, our approximation to the quadrature 
rule {xk,uk}k=i becomes 



A I M 



k=l k=l 



M 

2J o; fe (3£(a; fc ) 



Q ■ (9) 



Written in this way, the bracketed expression in the last term of Eq. {9]l is seen to be the integration of the basis and acts 
as an effective quadrature weighting. Later on (Algorithm[4]and, in particular, Eq.<|26|>) we show that this expression can 
be rewritten in a more recognizable form 

rn m 

j) « /roq(«, j) ■= ^2^f Ki(Pt) h H(Pi) = gij(Pi) ■ (10) 

e-i t=i 

Eq. (jTOj is our proposed Reduced Order Quadrature, which has an online evaluation cost of 0(m). The interpolation 
points {pg}"^ C {^felfcLi are outputs of the DEIM algorithm. Both these points and the weights ojJ 10 ^ only depend on 
the underlying quadrature rule given by {x^, and on the choice of reduced basis vectors ef, they are precomputed 

off-line and do not depend on the integrand g^j. Note that the ROQ rule l |10| > achieves our efficiency requirements 
whenever m < M. In many cases one can expect exponential convergence of Iroq - ► ^d, and for I& ~ Ic we expect this 
convergence rate to be inherited by the limit to the continuum Jroq — > Ic with respect to m. Furthermore, even though 
this might be problem-dependent, in our numerical experiments (see Section [5T2j > we have found that m ~ 2n <C M 
(the equality m = 2n exactly holds for polynomial bases). For the case m = M, when all degrees of freedom have been 
exhausted, consistency requires both quadrature rules {p£,uf" }&Li and {xk, 0Jk}k=i to ^ e identical. Indeed, in this 
case the quadratures points trivially coincide and one can show (see Corollary [lb. 

In Sec. [3] we propose and analyze an algorithm for constructing the product basis {e£}™_ 1 as well as identifying 
the nearly optimal interpolation points {pi}"L 1 . A straightforward and largely self contained blueprint to generate the 
quadrature rule {pi, ^ OC ^}^L 1 is given in Algorithm^ 



3 Reduced basis and the discrete empirical interpolation method 

The ROQ rule proposed here requires a compact and accurate interpolation representation for all integrands in the 
space of interest. It has been shown that the EIM |4]|5) or its discrete counterpart, the DEIM (2][3), offers an attractive 
interpolation approach for reducing the complexity of a problem without sacrificing accuracy. In turn, the approach relies 
on knowledge of a problem-specific (ordered) basis such that any function of interest can be accurately represented by 
its I? projection onto the space spanned by these basis vectors. There are many ways to generate such a basis set, such 
as Proper Orthogonal/Singular Value Decomposition [6| and the Reduced Basis (RB)-greedy algorithm |1|. Here we 
favor a RB-greedy approach, since it is able to handle very large problems, such as those that we are interested in, but 
our proposal is directly applicable to other basis choices. 

In this section we describe both the RB-greedy and DEIM algorithms. Sec . [3T| introduces the RB-greedy approach to 



approximate the set of functions T or T, and the DEIM algorith m of S ec. 3.2 provides empirical interpolation points for 



these basis functions. Instead of directly approximating T, in Sec. 3.3 we propose an alternative, the two-step RB-greedy 
approach for large sets of products of functions. Without loss of generality we assume all functions to be normalized 
with respect to the discrete inner product (•, -)d. 



3.1 Greedy construction of a reduced basis 



The RB approach, combined with a greedy algorithm to generate the basis, provides a way to construct an application- 
specific expansion, where the basis elements are members of the space under consideration itself. Here the sets of interest 
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are T and T . For definiteness in notation we will describe the approach for T, since an approach for T or a ny o ther 



3.3 



space is identical; we will refer to this approach applied to T as a direct or one-step RB-greedy. Later in Sec. 
mention some of its drawbacks and our proposal to overcome them, via a two-step RB-greedy. 

A greedy approach is highly efficient in practice. It yields a nested basis set that is hierarchically constructed. As is 
customary for spectral expansions, having an orthonormal basis {eg }™ =1 simplifies computing the orthogonal projection 
Vnhfj, with respect to (■, ■)& of a function ftp £ T onto the span F n of the basis, 

F n := span{ei, . . . , e„} , 

namely, 



t=l 

We recall (see, for example, [11]) that the Kolmogorov n-width of T in H 

d n {T;H) = inf sup inf ll/i^ - /II , (11) 

dimX„<n h^jr feX n 

measures the error of the best n-dimensional subspace X n C H approximating T, and since H is a Hilbert space, 

inf \\hfj, - /II = \\hu - V n hu,\\ ■ 

Computation of the n-width d n , or any basis achieving it, is in most practical applications not possible (however, see 
1111 for some simple cases where it can be done). Nevertheless, obtaining a convergence rate for the n-width provides 
valuable information towards understanding the approximability of a parameterized space by an algorithm such as the 
greedy. 

We summarize a greedy strategy to build a reduced basis {e£}™ =1 , with an approximation error 

a n (T;H) := sup ||/i M - PnV|| i 

which is nearly optimal with respect to the Kolmogorov n-width defined in Eq. i |l Then we will state some of 
the available convergence rates for the greedy error a n (J 7 ;'H) based on 11211131 . which make precise the quoted near 
optimality, under the assumption that up to machine precision 

|| V — t-Vi^mII — || V — ^"Vlld • 

While the greedy error l |12| > has been defined over T, in practice one discretely samples the continuum using a 
training set Tk '■= {m}^=i C V of size K and a training space Strain of associated normalized functions {/i/ij}|=i. 
Typically, if there is redundancy, the number of reduced basis needed to represent the training space is much smaller 
than the number of samples: n <C K. 

Next let e > be a user-specified error tolerance. Its role is to guarantee that the approximation error ensured by the 
RB-Greedy algorithm is strictly bounded as 

fn (Strain! H) ■= sup \\hfj, - Vnhy, || d < e . (13) 

To a-priori ensure that the training space J^rain is a faithful approximation of the continuum T is, in general, difficult. 
Construction and adaptive management of the training space Strain is an area of active research; see, for example, 1141 
l9lll5|[T6ll . A good choice is dictated by the problem, further details for our application are given in the numerical results 
Section |5.2| To check that the reduced basis space F n is a faithful approximation of T we typically do convergence 
tests with respect to the number of samples K in Tk, and Monte Carlo reconstruction studies of the continuum; see for 
example 1 17, 18 19] and Section [5^2] For any given tolerance e > 0, Algorithm [T] ensures the strict bound over the 
entire training space. However, a-posteriori validation has allowed us to establish a more impressive bound in all of our 
applications in gravitational wave physics so far 117lll8lfT9l . 

Wh^-Pnh^U^ a n (T;H)<e Vu G V . (14) 

Such observation is obviously problem-dependent, and in general it should read a n (T\ H) < e where in principle I > e. 
For any e > and training set Tk me greedy algorithm to build a reduced basis is as follows: 
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Algorithm 1 (RB Greedy Algorithm) 

[{ee}?=x,{M}i=i\ = RB-Greedy (e, T K ) 

(1) Set n = and define <jq (Strain ; H) '■= 1 

(2) Choose an arbitrary ui € Tk and set ei := Comment: Hfy^ ||d = 1 

(3) do, while a„ (J" tra i n ; > e 

(a) n = n + 1 

(b) a n {hn) ■= \\hf, - V n hp\\ A for all /_t e Tr- 

(C) <T„ (J^rain ;H) = SUp^g^ {<?n(V)} 

(d) /j„ + i := argsup MerA . {<7n(M} (greedy sweep) 

(e) e„+i := h fln+1 — V n h t _ ln+1 (Gram-Schmidt Orthogonalization) 

(f) e„+i := e„ + i/||e n+ i|| d (normalization) 



The above algorithm returns a nested, hierarchical set of n greedy points {/^}™ =1 and orthonormal reduced basis 
{_ e iYl=\ which are nearly optimal with respect to the n-width \\\\ , in the following sense. Recall from [13 Corol- 
lary 3.3] that if the n-width defined in \\\\ decays exponentially, 

d n (T;U) < Ce- Can " , (15) 

where C, co, and a are positive constants, then the greedy error in l |12[ l decays as 

On(F\H) < V2Ce- Cina , (16) 

with ci := 2~ 1 ~ 2a co. Similar statements can be made if the Kolmogorov n-width decays with a polynomial order i.e., 
if the n-width in < | 1 1 1 ) decays as 

d n {T;U) <C a n~ p , (17) 

where C a , /3 are positive constants, then 

a n {T;H) <2 5P+1 C a n- f) . (18) 



In Section [5T2| we summarize the observed exponential decay of the greedy error for a particular family of smoothly 
parameterized gravitational wave solutions; see [ 20 ,18,1719] for further studies. 

We have also found that for any given representation error and any parameter domain Tk we can reconstruct any 
member function in T, not necessarily in the training space used to build the basis, with a finite number of basis elements. 
That is, as the density of Tk increases, a finite asymptotic number of basis is needed to represent it; see, for example, 
[ 18 1, and |2T| for a similar result using RB and SVD respectively. 

Remark 1 While it is obvious that d n (T; H) < o- n (T; H), the decay rate of the Kolmogorov n-width gives an estimate 
for the greedy error decay according to ( |16[ l (exponential) or (jT8j) (algebraic). A direct comparison obtained in 1131 
Corollary 3.3] reads 



°n {T-n) < ^2d n/2 (T-n), 

for any n-width decay rate. If d n / 2 (T; H) decays as in (jT5j, then one gets \16\ . Similarly, if d n / 2 (J 7 ', %) decays as <jT7j) 
then, one gets <|l 8fr. 



We now turn to some immediate applications of the reduced space expressed with matrix-vector notation, which 
will prove convenient throughout the remainder of this paper. Let 

V = [e 1 ,...,e„]GC Mx ", 

where the £ th column of the matrix V is = (e^(xi), . . . , ei(xM)) T G C M i.e., it corresponds to the £ th reduced 
basis eg sampled at the set of quadrature points {xi}f!L 1 used in the greedy algorithm^] In this matrix- vector notation 
the expansion coefficients c e (y) = (e^, /i^)d for Vnh^ = Yle=i c i{i J ) e i now read 

c = V t ( U oh„) 1 



For certain applications it may be desirable to evaluate the functions at a different set of points {j/;}*^, see Section 



4.3 



for details. 
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where c = (ci, . . . , c n ) T G C™, denotes the conjugate transpose of V, and o denotes the standard Hadamard (Schur) 
or pointwise product between the vectors (in this case cl> and h M ) components. Therefore the reduced discrete integral 
^ becomes 

n 
k=l 

= (v t ( W oh (1 ,)) T (v t ( U oh ft )). (19) 

Here we have reduced the overall dimensionality of the problem to n, but two issues remain. First, the number of 
{hfj,(xk)}if = i functional evaluations depends on M. Second, the dominant operation count for the approximation is 
seen to be of order O(Mn). We will return to these issues shortly. 



3.2 The discrete empirical interpolation method (DEIM) 

Recall from Sec. |3.1| that an approximation by projection, such as Vnh^, requires knowledge of projection coefficients, 
the computation or approximation of which relies on potentially expensive quadrature rules. The main idea behind the 
DEIM is to replace a potentially expensive approximation of projection coefficients by a relatively inexpensive interpo- 
lation, without sacrificing accuracy. The DEIM algorithm identifies a set of basis-specific interpolation points through 
a greedy selection criteria. When applied to several physical dimensions, irregular shaped domains, and rather generic 
parameterized spaces, the DEIM |2][3) and EIM 1411101191 points have shown remarkable robustness and efficiency. 

Suppose we have a set of basis vect ors. In particular, the algorithm applies to the reduced basis for the space F n or 



the space F m to be constructed in Sec. 3.3 The algorithm selects the DEIM points in physical space Q and builds an 



associated interpolation matrix P. This matrix P interpolates the columns of the matrix V introduced in Sec. |3.1| 



Algorithm 2 (Selection of DEIM Points) 

[P, {pi}™ =1 ] = DEIM (V, {£fc}fc=i) Comment: the column vectors of V must be linearly independent 

(1) j = argmax |ei| Comment: here argmax takes a vector and returns the index of its largest entry 

(2) Set U = [ei], P = [ej], pi = Xj Comment: ej is a unit column vector with a single unit entry at index j 

(3) for i = 2, . . . ,n do 

(4) Solve (P T U)c = P T ei for c 

(5) r = e, - Uc 

(6) j = argmax |r 

(7) Set U = [U r], P = [P e,-], p % = Xj 

The DEIM algorithm described above is nearly identical to the one given in Ref. [2] with the exception of Step 7 
which is U = [U e»] in that reference. In Appendix[C]we show these algorithms to be equivalent and, furthermore, 
we show a relationship between DEIM and Gauss-elimination with partial pivoting. Owing to the lower triangular form 
of P T U, in Appendix[5] we show that Algorithm[2]has a reduced computational cost. 

Remark 2 The columns of P are unit vectors with a single unit entry at the location of the empirical interpolation points 
and zero elsewhere. For example, the first column of P has a unit entry at the location argmax |ei | . Furthermore, P T h.^ 
extracts an n-subvector which is exactly equivalent to evaluating at n empirical interpolation points {pi}f =1 . 

With empirical interpolation points {pi}f =1 we are now in a position to present an explicit and practical expression 
for the interpolant. Consider the approximation 

n 

(h^xx),. . . ,h^{x M )) = hfj, ^^2ci(fi)ei = Vc(/i) , 

i=l 

where c(/x) are unknowns to be found via interpolation. Multiply both sides of this expression by P T to deduce that the 
interpolation step described in Eq. l[8| is equivalent to 



P T h M = P T Vc( M ) , 
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which can be solved by (note that P T V is invertible when the columns of V are linearly independent (2) ) 

c(a.) = (P T V)~ 1 P T h Al . 
Hence, an explicit form for the DEIM approximation is 

h M w 2„[h M ] := V(P T V) _1 P T h M , (20) 

which is indeed an interpolant: 

P T X„[h M ] = P T (V(P T V)~ 1 P T h M ) = P T h M . 

That is, the interpolant agrees with the original function at the interpolating points, as it should. 
Substituting Xrjh^] from Eq. ( |20| i into Eq. §6§ gives 

I d (i,j) « {l n [h^},X n [h H ]} d « ((P T V)" 1 P T h w ) T ((P T V)- 1 P T h Mj ) . (21) 

As we only have to evaluate P T h M £ C™, with n < M in many applications, we have reduced the complexity (that 
is, the number of required functional evaluations) to 0{n) (compare with \\9)). However, due to the two matrix-vector 
multiplications an evaluation cost of 0(n 2 ) has appeared. For many (if not all) interesting problems one should expect 
0(n 2 ) > O(M). Such unacceptable scaling arises when approximating the individual functions instead of the 
products (integrands) h*^ i directly. We consider the generation of a reduced basis for products of functions next. 



3.3 Two-step greedy approach for the approximation of product of target functions 



, Path #1 . 

/■ ► \ 

f >F n 2 ► F m (22) 

Path #2 (i) Path #2 (ii) 



As discussed at the end of Sees. 3.1 and 3.2 we seek to approximate all possible integrands = hj li h^ lj g T, 
where T is a subset of the Hilbert space T-L = L 2 (fi). A natural approach, which we refer to as a direct one, is to build 
a reduced basis for T t hrou gh a gre edy algorithm as described in Algorithm [I] and is marked as Path #1 in l |22| >. In that 



case the results of Sees. 



3.1 



and 



3.2 



'. are directly applicable, with T and F n replaced by T and F m respectively. However, 
if a faithful approximation of J" requires a training set Tk of size O(K) then an appropriate training set for a faithful 
representation of T would in principle be of size 0(K 2 ). Since for large problems O(K) can already be computationally 
challenging, this direct approach might be unfeasible in some applications, even when building the basis is an offline 
and parallelizable calculation. In Section [5T2| we argue that this direct approach can be impractical. Advanced algorithms 
for sampling or building the training space, perhaps adaptively [14,9,15,161, might overcome this issue, though, and 
allow for a direct approach to such large problems. 

Here we advocate what we call the two-step greedy approach via Path #2 in ( [2"2"| l. The basic idea is to take advantage 
of the observation that all integrands can be accurately approximated by 

fn \*/n \ 

which involves a sum of n 2 terms featuring products of functions purely of the form e£e;. We then carry out a second 
layer of dimensional reduction - a second greedy for the products {e£e;}jj ;=1 selected by the first greedy. The training 
space for this second greedy is of size C(n 2 ), as opposed to 0(K 2 ) for the direct approach (with, usually, n 2 <g K 2 ). 

In more detail, our two-step greedy works as follows. In the first step {Path #2 (i) in ( |22[ >) we generate, through 
a first greedy as in Algorithm [T] a reduced basis -whose span we denote by F n - approximating T . Next one might 
construct a set F n 2, consisting of all n 2 products e* k ei/\\e* k ei ||d- Another option, and indeed the one used in our numerical 
experiments, is to use a training set T 2 '■= {{m, Mj')}?j'=i an( ^ associated normalized products h^h^ /Wh^h^ ||d to 
define F n 2, where {/ii}™ =1 are the greedy points identified by the first greedy algorithm when approximating T by F n . 
In other words, we do not necessarily use an orthonormal reduced basis for F n (see Remark[3] below). In either case, we 
approximate F n % through a second RB-greedy (Path 2 (ii) in l|22|l), carried out again as in Algorithm Q but with F n 2 
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as the training space. The result is an orthonormal set of m reduced basis {c^}™ 1 such that F m '■= span{ei, . . . , e m } 
accurately approximates F n i , and in turn the full set T . 

At the end of the day, either using a direct approach (Path #1) or our proposed two-step one (Path #2), given any 
integrand = h^h^ £ T there is an accurate reduced order approximation V m gij of the form 

m 

9ij w V m gij ■= ^2(ek,9ij)d efe . (23) 
fe=i 

For definiteness, we choose the tolerance for the second greedy step to be equal to the first one (e in Eq.l|13[>). To 
better explain the algorithm it will be useful to introduce new notation for elements of T n : define Jlk(i,j) := (MiiMj) 

2 

to be a re-indexing of a two dimensional array {(/Xj, fj,j)}f j =1 into a one dimensional array {Hk\k=i' wnere k is a. 
one-to-one function which takes two integers i and j and returns a unique integer k(i,j). 

Remark 3 The first greedy returns both the greedy points {Mi}™ =1 , an orthonormal basis {ej}^_ 1; and the greedy basis 
functions {/i^;}™ = i, serving as a non-orthonormal basis set for the same space F n . As discussed above, from either 
{^Mi}?=i or { e i}?=i one can build the set F n i which is approximated below by Algorithm[3] In our numerical exper- 
iments we choose {/i w }™ =1 as basis, partly because we need not store the orthonormal basis vectors {ej}" =1 but only 
the greedy points {/^}" =1 . Clearly the implementation and interpretation of Algorithm[5]will depend on this choice. For 
example, when using {hinYi=\ 11 makes sense to discuss the selected greedy parameter points ju/. while for {ei}™ =1 no 
such interpretation holds; instead one should view Step 5d as returning a column index when F n 2 is thought of as an 
M x n 2 matrix. 

To summarize, the second step of our proposed two-step greedy algorithm to build F m is 

Algorithm 3 (Two-step RB-Greedy approximation for F m ) 

[{et}?=i, = Two-step RB-Greedy (e, { W }? =1 , {ej? =1 ) 

Comment: {ej}" =1 are either orthonormal or greedy basis functions (cf. remark[3]l 

(1) Let7; 2 = {(Mi,^)}"j=a 

(2) LetF„ 2 ={eje i /||eje i || d }? j - =1 

(3) Set m = and define a (F n 2;H) := 1 

(4) Choose an arbitrary g £ F n i and set ei := g Comment: \\g\\& = 1 

(5) do, while <T m (F n z;H) > e 

(a) m = m + 1 

(b) a m (g) := \\g - V m g\\ for all g G F n 2 

II lid 

(c) o m {F n 2- H) = sup seF)i2 {(r m {g)} 

(d) Jim+i ■= argsup geFn2 {a m (g)} (greedy sweep) 

(e) e m+ i := gj, m+1 - Pm5)l m+1 (Gram-Schmidt) 

(f) e m+ 1 : = e m+ 1 / 1 1 e m+ 1 1 1 d (normalization) 



Paralleling the discussion in Sec. |3.1| and Sec. |3.2| a discrete form of the approximation l |23| l is accomplished by 
defining the matrix 

= [ei,... ,e m J £ C 

where the £ th column is = (e e (xi), . . . , e^(xM )) T ■ Then in matrix-vector notation P m g is 

Pmg = VV t («og), 

where we have dropped the indices i, j for brevity. Continuing, we build the interpolation matrix P from Alg. (|2li with 
V as input. The DEIM interpolant is found by mimicking the steps in Sec. 

g ~X m [g] := V(P T V)- 1 P T g. (24) 

A discrete integral rule stemming from the above interpolant leads to an efficient and simple reduced order quadrature 
rule l |10[ > and is the subject of the next Section. 



3.2 which produces X m [g] satisfying 
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4 Reduced order quadratures (ROQ) 

In classical theory of Gaussian quadratures one seeks to maximize the exactness for the integration of polynomials. Here 
the goal is to empirically maximize the accuracy of inner products between elements of T . Notice that an appro ximation 

we noted 



of (discrete) inner products may be carried out in either a reduced space F n ~ T or F m ~ T. In Sec. 
that working in F n leads to a pessimistic 0{n 2 ) cost for inner products (see the discussion below Eq. (21 1). Thus, for 



3.2 



applications where one needs fast evaluations of many scalar products the approximation space F m is preferable. For 
the construction of F m we refer to l|22|i. 
Section ■ 



4.1 



focuses on a ROQ stemming from F m , although only very minimal changes are needed to adapt Algo- 
rithm[4]for obvious variations. Indeed, one such variation is considered in our first numerical experiment. In Section [43 
we consider extensions of the ROQ construction to situations where one might only be able to evaluate integran ds at a 



4.2.1 



set of points which do not correspond to the nodes used in the quadrature rule {x^, cjk}^f =1 . In Section 

we present the DEIM interpolation error estimates, followed by some ROQ error estimates in Section [4.2.2| 

4.1 ROQ Algorithm 

Suppose we are given a set of functions T and an arbitrary quadrature rule {x k , ui k } k=1 for the discrete inner product l(6j. 
The following algorithm generates the ROQ nodes and weights. 

Algorithm 4 (Construction of Reduced Order Quadratures) 

[{pt,uf OCi }? =1 \ =ROQ(e, T K , {x k ,u; k }^ =1 ) 

(1) Approximation of T: We have two options to approximate T by F m , (cf. I|22|i). 



Path #1 Consider the training set T K = {(m, fij ) }fj=i an d a Pply the Greedy AlgorithmJIJ with obvious modifica- 
tions, to generate F m = span{e^}^ 1 and greedy points {Jj-^YI! =1 C T K such that any product h*^ i h ilj g T 
can be approximated by its projection V 

KViVf -^m (ftjttiVf) < e > V (fli,flj) G T K ■ 
II lid 

Note: This direct approach can be expensive; we therefore advocate the following two-step procedure. 

Path #2 (i) Approximation of _F: Consider the training set Tk = {fJ-ij^x and apply the Greedy Algorithm[I] 
to generate F n = span{e^}" =1 and greedy points T n = {^i }™ =1 C Tk such that any hp G T can 
be approximated by its projection V n as: 

\\hfi -'PnVlId ^ e ' V/i G Tk ■ 

Training set Tn'- Define the training set Tn = Tn x Tn = {(Pi, an d the set 

F n 2 = {e*ej/]|e*ej|| d }™ i=1 . 

Another choice for F n i is {h* lli h Uj /\\h* a .h Uj ||d}™^_ 1 > we refer to Remarkj3jfor justification. 

(ii) Apply the Greedy Algorithm [i] to generate F m = span-te^}^^ and {]li}J! =1 C Tn C T K such that 
any product e*ej G F n i can be approximated by its projection V m as: 

e*ej - V m (e*ej) <e. 



d 



(2) Define the projection matrix 

Vr~ ~ i pMx rn 

= [ei, . . . , e m \ G <L , 

where, for example, the £ th column is = (e^(xi), . . . , e^(xM )) T - 

(3) Generation of empirical interpolation points: Apply the DEIM Algorithm [2] with V as an input to compute the 
DEIM points {pi}" L =1 C {x^fLx and the interpolation matrix P. 
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(4) Compute the ROQ weights by 

T 



(u, ROQ ) := c^VCFv)- 1 . (25) 

Step 4 deserves a bit more explanation. With the interpolation matrix P from Step 3 we first build an expression 
for the interpolant of some product g £ T using Eq. l |24[ l. The reduced order weights are then found by substituting the 
DEIM interpolant into Eq. l[6} to produce the desired approximation with g = . o , 

m 

h{i,j) = u: T S « u> T l m [g] = [a; T V(P T V)- 1 ] P T g = ^ W f OQ 5 (?i) • (26) 



z=l 



Notice that the term P T g is just evaluation of g at the DEIM points {pi}™ x (which need not be ordered, but can be if 
desired). Furthermore, the weights are explicitly parameter independent and for less accuracy we might consider using 
the first m! < m functional evaluations of g 

ra 

i=i 

which, as this is equivalent to carrying out the computation in an m'-dimensional subspace of F m , is also a valid 
integration rule. In general the weights a ROC ^ associated with this mf point quadrature rule will not be a subset of the 
weights ^ RO< ^ computed for the m point rule. Notice that the hierarchical nature of the empirical interpolation set allows 
one to build application-specific nested quadratures of arbitrary order and depth (i.e. more than one embedded method). 
For example, suppose m = 20; one might combine the 10-point, 15-point, and 20-point ROQ rules to yield a nested 
quadrature scheme. 

Next we state some results which highlight the importance of using an accurate quadrature rule {x^ , Uk } fe£i> these 
will be verified in Section I5TT1 



Theorem 1 Integration of the basis functions {ei}^L 1 with either of the quadrature rules {xi, or {pg, w R 

yield identical results. 

Proof The integration of basis {ei}™ =1 using ROQ (cf. I |25| l, ( |26[ l in vector notation) is 

/roq = (u> ROQ ) T P T i, = o; T V(P T V)~ 1 P T i l , i = 1, . . . , m . 
Recalling V = [ei, . . . , e m ], integration of all the basis {e^}™ 1 can be written as: 

( W ROQ ) T P T V = u> T V, 

where u> T ~V is precisely the integration of all the basis vectors {ej}™ 1 with the quadrature rule {xk, u>)-}ff =1 . 

Corollary 1 (Consistency of ROQ rule) When m = M the quadrature rules {x.^uj^f^ and {P£,k> RO( ^}£Li are 
identical up-to ordering of the quadrature points. 

Proof Using Theorem[T] we have 

(^ roq ) T P t V = u> t V, 
apply V -1 on the right to deduce (u> ROQ \ P T = oj t . 



Remark 4 If the quadrature rule {xi,u>i}fL 1 integrates the basis functions {eg}JL 1 exactly then ROQ built using Algo- 
rithm[4]also integrates those basis functions exactly. 

Remark 5 As a special case consider .F to be a space of polynomials of degree m — 1 with weight W = 1 and suppose 
the basis is specified by an ordered set of Legendre polynomials. Then the m-point ROQ rule will exactly integrate any 
polynomial of at most degree m — 1 provided the quadrature rule {x^, u>],}ff = i is at least m — 1 exact. 
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4.2 Convergence Estimates 
4.2.1 DEIM Convergence 

We shall first state a DEIM interpolation error estimate with respect to the discrete L 00 -norm. Then we will recall the 
DEIM error estimate with respect to the discrete i 2 -norm from [2 | and, finally, combine this estimate with those for the 
RB-greedy basis construction from II 1 31 . 

Theorem 2 (discrete empirical interpolation method) Let the set of reduced basis {e£}™ =1 be orthonormal with 
respect to the discrete inner product defined in ([3} and Jiff' £ F n be the optimal approximation of h a with respect to 
the discrete L°° -norm. Then for every a £ V and x k , with k = 1, . . . , M, 

max \hij,(x k ) - T n [h ll ](x k )\ < (1 + A n ,oo) max \h n (x k ) - h° pt (x k ) , (27) 

l<A:<Af l<£;<Af I 

where A n . oo = flntoo = ^(P^V) -1 ? 71 !^ denotes the Lebesgue constant. 
In the case of the discrete L 2 -norm, hfff 1 = P n h a and we have for all a £ V 

||^-2n[A/*]|| d <^n,2||^ Al - , PnV|| d - ( 28 ) 
where A n .2 = |||Xn||| 2 = ||| V(P T V)- 1 P T ||| 2 [] 

Proof The proof of l |27| i and l |28| > are straightforward and are omitted. Furthermore, ( |28[ l follows from |2, Lemma 3.2], 
but we state it for completeness. Since I n [h'ff ,t ](x k ) = h'ff t (x k ) for k = 1, . . . , M, we get 

\\hf_t -I n [hn]\\ = (1 -T n ) [hfi - frjf*] 

II lid 

I lid 

where the last equality follows from the fact that ||| 1 — I n ||| 2 = \\I n ||| 2 (see, for example 12211231 ). 

Remark 6 Notice that the DEIM Algorithm [2] seeks to minimize the interpolation error as measured by a discrete L°°- 
norm (cf. Step 6, Algorithm]^, for which an error bound of the type ( |27| i is closely related. The RB-greedy Algorithm 
[TJ however, exactly minimizes the term||/i M —V n h u \\ d on the right hand side of the error bound l |28| > and is therefore 
computable. Observing that A n ,2 < |||V||| 2 |||(P T V) _1 ||| 2 (because |||P T ||| 2 = 1), one might consider directly mini- 
mizing the error||/i M — I n [/i/j.] 1 1 d by minimizing the norm |||(P T V)~ 1 ||| 2 . Indeed, whenever V^V = l nX n we have 
exactly A n> 2 = |||(P T V)~ 1 ||| 2 which is more efficient to compute than |||V(P T V) _1 P T ||| 2 . Throughout the numerical 
experiments section]?] in addition to the error bound ([28} we monitor 

\\h„-T n [hu}\\ d < |||V||| 2 |||(P T V)- 1 ||| 2 ||/ lA1 -Pnh^ , (29) 

which we have written here for future reference. 

Corollary 2 (RB-Greedy-DEIM error bounds) The following practical RB-greedy-DElM estimate holds 

\\hfx -Tn[h^]\\i < A n< 2e, V/i,, 6 J. 

Furthermore, if the Kolmogorov n-width decays exponentially (with order a) as in \15) , then the error ( |28| l decays with 
the same order, 

||^-2^Mlld £ V2CAn,2e- Cin " , 

whereas if the Kolmogorov n-width decays algebraically with order f3 as in j!7| l, then l |28| > again decays with the same 
order /3 

||V -X„[MIU < 2 5/3+1 CaAn,2n- p , 
where C a > and /3 > are some constants. 
Proof Consider < fT4] ) and (|28) and apply {T6) and ([18). 

4 See Appendix A for the definition of the 2— norm ||| A| 2 of an operator A. 
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4.2.2 ROQ Convergence 

Next we present the error estimate for the case in which the reduced basis {ei}J! =1 for F are generated through a direct 
approach via Path #1 in ( |22[ l. 

Theorem 3 (approximation error ROQ (direct approach)) Let e > be an error tolerance and V m ■ F — > F m be the 

projection operator defined in Algorithm [4] ( Path #1 ), with accuracy e > in the discrete L 2 -norm \ \ ■ \ | d . Furthermore, 
let T m : F — > F m be a DEIM operator built using Step (3) (RB-greedy-DElM) of Algorithm^ Given two arbitrary 
functions , h ihj £ F, the following estimate holds 

\h(i,j)-lKOQ(i,j)\ <e(||l|| d 1^112)11^^ ll d ■ 
Proof We recall that given , h flj £ F, with ^ , fij £ V, we have 

M 

fc=i 

On the other hand, we have the following expression for the ROQ integral 

M 
k=l 

An application of Cauchy-Schwarz implies 

M 

~ lROQ(hj)\ < ^2 \ Wk ( h U( x k) h H( x k) ~ [fyu 4 Vj K 2 *)) | 
fc=l 

< IlilU \\Ki h H ■ 
II lid 

Next the fact that X m \v m [h% h N ]] =V m [h* M h l _ lj ] , implies 
Finally using |||1 - X m \\\ 2 = \\%n\\ 2 (cf. 12211231), we get 

\h(i,j) -lROQ{i,j)\ <l|l|| d IPm||| 2 \\hU h H -Pm^m 

II lid 

< ||l|| d |||X m ||| 2 (Tm 

Then the greedy estimate Path #1 in Algorithm|4]gives the required estimate. 

Our numerical examples indicate the ROQ built from a two-step greedy (using Path #2) has an error bound as in 
Theorem[3] but proving such bound is beyond the scope of this paper. 

Remark 7 (FLOP-Count) Let K be the number of parameters in the training space to approximate F by F n . If m and 
m denote the number of reduced basis for direct {Path #1) and two-step greedy (Path #2) respectively, then numerically 
we observe that m « fh. The cost of ROQ Algorithm [4] using a direct and a two-step greedy is 0(K 2 Mm) and 
0(n 2 Mm + KMn), respectively. This implies that for K 3> n (as is observed in our numerical experiments), the 
two-step greedy is highly efficient. For the details of FLOP counts we refer to Appendix |B| 
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4.3 ROQ at new quadrature points 

We now consider the scenario in which the integrands are available at a set of points which is different from the ones 
used to build the reduced basis. A typical example of this would be the case in which one is allowed to choose the 
sampling points and quadrature rule to build the basis, for example in RB-greedy algorithm, but not the points at which 
the data is available for computing the ROQ. 

We continue denoting the set of points and quadrature rule used to build the basis as {xk,ujf.}ff =1 , and we denote 
those for which the online data are available as {yi,Ti}f£i. It is usually cheaper to generate F m = span{e^}"^ =1 and 
{Ue}lLi using {x^, cJk}kLi (for example using Step 1, 2 of Algorithm]^ than {yi, r^f^, if we can choose the former. 
Therefore, it is not unreasonable to assume, for simplicity, that M' > M. Our goal here is to generate an ROQ with 
respect to {yi, Ti}f£ 1 under this scenario. 

Algorithm 5 (Construction of ROQ for {yi, t^}*^) 
[{PI, rf OQ }r=i] = ROQ-NEW {vuTi}^) 

(1) Orthogonalize the (ordered) product of functions {gji^JLi using the quadrature rule {yi, Ti}f£ 1 and let the resulting 
vectors form the columns of a matrix V. That is, 

= [ei, . . . ,e m J £ <L , 

where, for example, the £ th column is = (ei(yi), ■ ■ ■ , ee{yM')) T ■ 

(2) Apply Steps 3 and 4 from Algorithmic generate the ROQ points {Pi}^! =1 C {j/j}|£i an d the interpolation matrix 
P. Then the ROQ weights are given by 

(t ro Q) t := /^(Fv)- 1 . 

Accuracy and conditioning of the resulting ROQ rule are guaranteed provided (i) we are able to carry out orthogonaliza- 
tion in Step 1 up to a certain tolerance and (ii) the new quadrature rule is accurate enough such that | — P„ft p | < e. 

5 Numerical examples 

Here we discuss two sets of numerical experiments. As a first example we compare ROQ using orthogonal polynomials 
on the interval [—1,1] as basis (that is, the basis is not built through a greedy approach) with the well known case of 
Gaussian quadratures and, in particular, integration of Runge's function. This study reveals several important features: 
i) ROQ is well conditioned, ii) ROQ and Gaussian quadrature rules have a similar point and weight distribution, iii) the 
importance of an accurate quadrature rule {xi,ui (cf. Theorem[l}. 

Next we turn to a challenging example drawn from gravitational wave physics, namely the calculation of overlaps 
between "chirp" gravitational waves for compact binary coalescences. Details on implementation, error bounds, and a 
discussion of computational cost are provided, as well as the physical motivation of the problem. In particular, we find 
that ROQs exhibit exponential asymptotic convergence in the calculation of overlaps, with a profile of the error function 
in terms of quadrature points qualitatively similar to the greedy error curve of the training space in terms of the number 
of reduced basis elements. We emphasize that this exponential convergence in computing overlaps is, as expected, 
present even when the function is sampled at equally spaced quadrature points. In particular, this means that reduced 
order quadratures can be used as an efficient down-sampling strategy when function evaluations from experimental 
observations are given at (for example) fixed intervals. We also show the staggering offline savings of our two-step 
greedy approach compared to a direct one. 

5.1 Comparison with Gauss-Legendre quadratures 

In order to highlight the essential features of our approach, in this subsection we use Legendre polynomials {P£(z)}#Li 
defined on x £ [—1, 1] as an orthonormal basis (with respect to W(x) = 1) instead of a reduced basis generated in Step 
1 Algorithm]?] Steps 3 and 4 of Algorithm[4]require specifying an underlying quadrature rule, which in this subsection 
will be either an M point Gauss-Legendre or extended trapezoidal rule. The resulting ROQ point and weight distribution 
(see Subsection |5.1.1[ >, conditioning (see Subsection |5.1.2[ > and application to Runge's function (see Subsection |5.1.3[ l 
are considered next. 
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5.7.7 Point and weight distribution of ROQ 

To build the ROQ points and weights an extended trapezoidal rule is used and all m = 24 Legendre polynomials 
are sampled at M = 1, 000 equidistant points. Fig . 1 1 (a)| compares the distribution of points and weights for the Gauss- 
Legendre quadrature (red dots) and ROQ (blue crosses). Both weight and point distributions are similar, and in particular 
the points are not equally spaced but instead cluster towards the end points at x = ±1. One of the ROQ weights is equal 
to -0.00496089441576999 at 0.775775775775776. Negative weights can be problematic in quadrature rules due to 
roundoff errors leading to poorly conditioned formula [24|. We turn to this issue next. 



2.25 




number of Legendre polynomials 



(a) ROQ and Gauss-Legendre point and weight distributions. (b) ROQ (solid) and Gauss-Legendre (dashed) condition number. 

Fig. 1 The top left scatter plot shows the weight u)^ and point distributions for Gauss-Legendre and ROQ using m = 24 Legendre polynomials 
as a basis (see Subsection |5 .1.1 | for more details), whereas the bottom left one shows just the point locations. The right plot gives the condition 
number Y^—t l^fc I for an m-point Gauss-Legendre and ROQ rule, with m £ [2, 200] (see Subsection|5. 1 .2|for more details). 



5.7.2 Conditioning of ROQ 

Consider the absolute condition number \u k \ of a quadrature formula which, on the integration domain x £ [—1, 1], 
is exactly 2 if all the weights to k are positive as can be seen by integration of unity. If some weights are negative then 
the condition number is necessarily larger than 2. A classic example of ill-conditioning is the Newton-Cotes rule, where 
Sfc l w fel > ^ and grows without bound when using more than 8 nodes [24]. Fig. 1(b) shows the absolute condition 
number J2k l^fcl f° r a Gauss-Legendre (dashed red line) and reduced order (solid blue line) quadrature. The dashed red 
line shows the optimal value of 2, which is achieved by the Gauss-Legendre quadrature case. The condition number for 
ROQ remains below 2.25 for the first 200 Legendre polynomials (see Subsection |5. 1.1 [ l and has no noticeable growth 
trend. This plot was generated using M = 1, 000 equally spaced points on x G [— 1, 1]: this resolution is sufficient for 
demonstrating well conditioning, but the resulting ROQ integration rule would be of low accuracy. We have considered 
up to 100, 000 points, always obtaining behavior similar to that one of Fig. |l(b)| 

5.7.5 Integration of Runge's function 

Many general features of ROQ can be ascertained by integration of Runge's function 



r^ ?T< fa = 2taa-Hl) l 



-i (1 + 

which is a classic example of oscillatory errors due to high-order interpolation. In the following experiment, integra- 
tion of Runge's function is carried out with three different ROQ rules built from a basis consisting of m Legendre 
polynomials. In each case the Legendre polynomials are evaluated at Mi = 10, 000 equally spaced and M2 = 400 
Gauss-Legendre points, and the integration errors are computed as |2tan _1 (l) — 7rqqI- 



Two-step greedy algorithm for reduced order quadratures 



17 



First, integration of Runge's function is carried out using an m-point ROQ-trapezoidal rule which is built from an 
Mi -point extended trapezoidal rule and the m DEIM point selections. The resulting integration error is given by the 
dash-dot black line in Fig. [2] Notice that the ROQ-trapezoidal error curve flattens out around 1CF 9 . Beyond this value 
the 10, 000 point trapezoidal rule is unable to accurately integrate the Legendre polynomials of degree higher than ~ 20, 
thereby limiting the ROQ accuracy (see Theorem[T]l. 

Next we show how accuracy may be restored in this particular case. Notice that the factor cl> t V in Eq. \25) for the 
ROQ weights is precisely the numerical integration of m Legendre polynomials. Furthermore, we know that the first 
Legendre polynomial integrates to \f2 while all others integrate to zero. Hence, in this particular case we may integrate 
them "by hand" by setting u> T V = [\/2, 0, . . . , 0], while continuing to generate interpolation points by way of the DEIM 
algorithm. The resulting ROQ rule, denoted in the figure as ROQ-equidistant, is then used to integrate Runge's function 
with an error given by the solid blue line in Fig. [2] This highlights the importance of accurate integration of the basis 
functions which, in a general setting where the basis vectors are selected by a greedy algorithm, should be carried out 
with a good underlying quadrature rule. 

Lastly, an ROQ rule is constructed from a M2 = 400 point Gauss-Legendre rule (dashed red line of Fig. [2j and 
shows excellent agreement with the ROQ-equidistant case. Of particular noteworthiness, all three ROQ rules clearly 
show exponential convergence with the number of quadrature points and no apparent issues stemming from Runge's 
phenomena. Notice that the dashed red and solid blue lines are nearly indistinguishable, confirming the expectation that 
if the basis functions are sampled densely enough (a statement which certainly depends on m) and integrated accurately 
enough by the underlying quadrature rule, the resulting rules perform similarly. 



Integration error of Runge's function 

10 i ■ ■ . 




# of ROQ points (to) 

Fig. 2 Convergence of the integral J j (l + x 2 ) 1 dx to 2tan —1 (l) using three different ROQ rules whose generation (using Legendre 
polynomials as basis) is described in Sec. p. 1.3] Integration error from a ROQ-trapezoidal constructed with a M\ = 10, 000 point extended 
trapezoidal rule is given by the dash-dot black line. Due to the extended trapezoidal rule's inability to accurately integrate higher order Legendre 
polynomials this curve flattens out around 10 -9 . If we exactly integrate these basis functions by setting u T V = [\/2, 0, . . . , 0] in Eq. \25) 
while continuing to use the ROQ-trapezoidal's quadrature points the accuracy is restored as seen from the ROQ-equidistant error curve (solid 
blue). Integration error from an ROQ constructed with a M2 = 400 point Gauss-Legendre is given by the dashed red line. 



5.2 Gravitational waves 

The inspiral and merger of neutron stars (NSs), black holes (BHs) or mixed pairs, known generically as compact binary 
coalescences, is believed to be one of the main sources of gravitational waves to be detected by the upcoming generation 
of earth-based observatories 1 25, 26, 27, 28. 29, 30 1. A passing wave distorts the length between any two points separated 
by a non-zero spatial distance. These distortions are measured by gravitational wave detectors as an effective strain 
hfi{x), where x denotes either time or frequency. In this problem the parameter fi depends on quantities such as the mass 
and spin of the compact objects, or detector orientation. In principle is obtained by numerically solving Einstein's 
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equations, which are a set of parameterized quasilinear hyperbolic -elliptic equations 131,32 33, 34] . However, due to the 
cost of these simulations and the properties of the target system, simplified models are often used. 

Here we focus on testing our proposed ROQ on the post-Newtonian (PN) approximation 1351 . under which the 
gravitational waves in the leading order Stationary Phase Approximation (SPA) are given by 136, 3711381. 

h M M) = Ar 7/6 ■ exp (i j- J + JL ( w . g . / . M c y 5/3 ^j , (30) 

where / denotes the frequency, G Newton's gravitational constant [39], c the speed of light, and A an overall amplitude 
that depends on quantities such as the distance to the compact binary coalescence. We will refer to Eq. < |30[ ) as a gravita- 
tional waveform, two representative examples are shown in Fig. [3] The single parameter fj, in this model is the chirp mass 
M c , which in turn is a function of the compact objects' masses {mi}l =1 ,M c ■= (m 1 m 2 ) 3/5 {m 1 +m 2 )~ 1 / 5 1781. All 
quantities here use MKS units. 




frequency (/) frequency (/) 



Fig. 3 Gravitational waveforms Ha (left) and h B (right) correspond to the smallest (A = 2.611651689888372M ) and largest (B = 
26.11651689888372Mq) chirp masses of the training set 7k defined in (32) . Waveforms are complex valued, but for clarity we do not show 
the imaginary part of Ha- 

There are various reasons one might need to repeatedly compute inner products between waveforms. One setting 
where this occurs is in gravitational wave searches using matched filtering [40 ,41 ,42. 38 1. Given data s recorded by 
a detector, a matched filtering search is carried out by computing all possible inner products of the data s against 
waveforms drawn from a (usually large) catalog. It can be shown I43II44I that the optimal matched filtering strategy 
requires one to compute inner products with a weight given by the reciprocal of the detector's power spectral density 
S{f) = W' 1 . As a representative example we use S(f) from the initial Laser Interferometer Gravitational Wave 
Observatory (LIGO) [45], which can be modeled by the curve [46 1 

S(y) =9x 10- 46 [(4.49j/)- 56 +0.16jT 4 - 52 + 0.52 + 0.32 V] , V = j^-- (31) 

To ensure that no signals are missed and that the parameters are correctly estimated many inner products are needed for 
each segment of data [ 1 8 47 ,48 49 1. These significant computational costs motivate a need for faster ways of performing 
these integrals, often in real-time 150,51,521. 

In the following series of experiments we consider gravitational waves of the form given by Eq. I |30[ >, generated by 
inspiralling binary black hole (BBH) systems with mass components in the interval £ [3, 3O]A/0, where 1M© = 
1.98892 x W 30 Kg is the astrophysical unit of mass known as a solar mass. Therefore the parameter domain under 
consideration is V = [A, B], where A = 2.61 1651689888372M© and B = 26.11651689888372M©. The physical 
domain frequency interval is given by Q = [40, 366.3383434841933] Hz. Here the lower value is set by the Earth's 
seismic noise which decreases the detector's sensitivity. The upper bound is set, roughly speaking, by the limitations of 
the PN approximation which breaks down at frequencies corresponding to the Innermost Stable Circular Orbit (ISCO). 
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While this is a mass dependent frequency, for simplicity we choose the upper limit as the maximum over the mass range 
here considered. 

Our primary goal in the forthcoming sections is to describe the construction and performance of an ROQ rule for 
computing inner products between gravitational waveforms given by Eq. i |30| >. These steps are conveniently summarized 
in Algorithmic] The first step, an approximation for T « F m , can be carried out either by means of a direct greedy (Path 
#1) or two-step greedy (Path #2). Due to a significantly reduced offline cost we advocate the two-step greedy approach, 
which in turn requires an intermediate approximation J" « F n . Results for this intermediate approximation are con- 
sidered in Section |5.2.1| We contemporaneously remark on the performance of the DEIM algorithm when applied to 
single functions to facilitate a better understanding of this algorithm in a simpler setting. In Sections [5.2.2| and |5.2.3| we 
complete the second half of the two-step greedy approximation, identify the interpolation points and finally generate 
an ROQ rule. In this section we also answer the fundamental question: Do the direct and two-step greedy approaches 
give comparable results and is the former much more expensive than the latter? As we will see, we have strong numer- 
ical evidence that, as predicted by our estimates, the two-step greedy gives enormous computational savings without 
sacrificing the accuracy or compactness of the reduced basis. 



5.2.7 RB-greedy and DEIM for single functions 



Here we consider experiments focused on approximations of the set of functions T = {fiM c {f) '■ -M-c € V}; with 
hj^4 c as in Eq. ( |30] > and the range V = [A, B] for the chirp mass M c -serving here as the parameter fx- as described 
above. We find the DEIM points and numerically confirm the DEIM error bounds from Sec. |4.2.2| This subsection is 
also precursor to Sections [5.2.2| and |5.2.3| where ROQs are constructed for waveform inner products. 

Reduced Basis: We begin by building a reduced basis space F n approximating T by using the RB-greedy approach. 
Since the number of cycles is proportional to an inverse power of M c , we have found it advantageous to build the 
training space with a logarithmic spacing between the samples, thereby clustering more points at low values of Ale- 
More precisely, a training set of size K is here given by 

T K = M = 0,...,/<-l| , (32) 

and an associated training space of normalized waveforms 

Strain = {h Mc (f) I Mc € T K } . 

Through numerical experiments we have found that for this problem the number of basis for any given greedy error 
saturates with at most K = 3, 000 training space elements, with - for example - 178 RB elements {ei}}l\ need ed to 



achieve a toleranc^jof e 2 = 10~ 12 . Therefore, the results shown below use 3,000 training space samples. Fig. 4(a) 
shows the distribution of points selected by the greedy algorithm. They cluster at low M c , which corresponds to more 
lower-frequency oscillations in Eq. l |30| >. Such clustering is expected, since the number of cycles in a frequency range 
/ £ [fmin, fmax] is given to lowest post-Newtonian approximation order by (graphically seen from Fig [3} 

A/" cycles = A/" {fmin) ~ A/" {fmax) , 

where (see Eq. (4.23), and Eq. (5.247) from Ref. l44l ) 



M{f) = 1/(32tt 8 / 3 ) (gA1 c /c 3 ) ^ / 



5/3 



The truly interesting aspect is that the distribution of points is nearly optimal in a precise sense, as described in Sec- 
tion |3. 1 1 and is automatically given by the RB-greedy algorithm. 

The solid black lines labeled "projection error" in both plots of Figure [5] show the square of the greedy error 
0"n (Strain \ H) , defined in Eq. (12) , over the training space Strain. as a function of the number of RB elements. Af- 
ter a slow decay, the error decays with a very fast exponential falloff, a feature that we have found in this family of 
inspiral gravitational waveforms and generalizations thereof HI 311 X 91 - 



5 Our numerical experiments are carried out with double precision arithmetic. This translates into a double precision computation of the 
quantity \\h)j, — 'Pnh/iW^ found in step 3b of Algorithm jl](RB -Greedy Algorithm), and hence an accuracy of about 10 — 7 in the computation 
of it's square root. These observations motivate a choice for the greedy error tolerance to be e = 10 — 6 . Refs. 1531541 address this issue in 
greater detail and propose alternatives for improving error computations. 
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Fig. 4 The left figu re|(a)| shows the parameters M c selected by greedy Algorithm[l] The right figure ( 
by DEIM Algorithm|2jout of 20, 000 equidistant frequency sample points. We see from the histograms that both the chirp masses selected and 
the frequency interpolating points cluster at small values. This is intuitively expected, because smaller chirp masses correspond to a greater 
number of waveform cycles in Eq. (30) . The point here is that the RB-greedy and DEIM algorithms automatically select both the parameter 
and physical relevant points in a nearly optimal way, without "human intervention" or "selection by eye". 



Discrete Empirical Interpolation: Since the reduced basis set {ej}|Z.i generated above is application-based (as op- 
posed to, say, Jacobi polynomials), an appropriate set of interpolation points is in principle not known, and that is where 
the DEIM algorithm of Section [3^2| enters the game. We now generate a hierarchical set of DEIM points for the reduced 
basis built (these results are not used in the construction of ROQ for inner products), choosing the maximum number of 
interpolating points which equals the dimension of the RB space. That is, using n = 1, 2, 3, ... , 178 reduced basis ele- 
ments, we sequentially computed the equivalent number of DEIM points. We recall that the RB-greedy approach selects 
points in parameter space, in this case the chirp mass, while the DEIM generates interpolating points in physical space 
(here frequency), and that both methods are hierarchical. That is, a seed choice for the first parameter value defines the 
first RB element, and from it the first interpolating point can be computed. Next, the greedy method chooses a second 
parameter value and the second RB element. By applying the DEIM to this new set (of so far two basis elements), the 
second interpolating point can be computed. And so on. Equivalently (the output is exactly the same), one can generate 
the whole RB first, up to the desired representation error tolerance, and then generate all of the interpolation points. Put 
differently, if there are n RB elements, up to n DEIM points can be generated, and generating some n± < n DEIM only 
requires the first m RB elements. This discussion, though perhaps trivial, might be helpful to keep in mind when later 
discussing the results from Fig. [5] 

As input to the DEIM algorithm we must provide the RB functions sampled on some set of frequency points. Two 
cases are here considered: i) the basis vectors are sampled at 1, 701 Gauss-Legendre points, which was the integration 
rule used to buil d the reduced basis in the Subsection |5. 2. 1| ii) the basis vectors are sampled at 20,000 equidistant 
points Figure 4(b) depicts the frequency distribution of selected points for the equidistant sampling case, which 
is of particular practical importance when one seeks to downsample experimental data. The overall structure for the 
Gauss-Legendre points case was found to be essentially identical, and is therefore not shown. This, we emphasize once 
again, is important because ROQ allow downsampling data without loss of accuracy, while producing a fast converging 
quadrature rule, regardless of the sampling. 

Next, we randomly pick 10, 000 waveforms, not necessarily in the training space, and represent each of them as a 
DEIM interpolant (that is, using Eq. (|24[l). Both sampling at Gauss-Legendre and equidistant points are considered. For 
each waveform the DEIM error is calculated in the discrete L 2 norm as \\h^, — I n \h^\^. Fig. [5] compares the largest 
DEIM error over all 10, 000 waveforms (solid blue line) with the greedy error (solid black line). Notice that the black 
line lies strictly below the blue line; for waveforms in the training space this is guaranteed by the optimality of the 
L 2 orthogonal projection Vnh^. In turn, the DEIM interpolation error is bounded, again as expected, from above by 
the a-priori theoretical error estimates given in by Eq. ( |28| l (red line) and Eq. ( |29| l (purple line). In principle these two 
error estimates strictly hold only for waveforms in the training space, but our numerical results show that they evidently 



6 See the discussion in Section 4.3 related to the subtleties involved when sampling the RB functions at a set different from those used to 
compute the underlying quadratures to build the RB itself. 



Two-step greedy algorithm for reduced order quadratures 



21 



continue to hold for waveforms between elements of the training space when the latter is sufficiently dense. The red line 
provides a sharper bound on the error but it is also more expensive to calculate (cf. Remark [6JI. Finally, notice that, at 
least for elements in the training space, the asymptotic convergence rate of the DEIM is predicted to be bounded by the 
RB-greedy representation error (with the Lebesgue constant as a proportionality constant), see Corollary [2] . From the 
figure we can see that the similar asymptotic convergence rates for the DEIM and reduced basis representation continue 
to hold for waveforms not necessarily in the training space. 



max error for single waveforms (Gauss points) max error for single waveforms (equidistant points) 




# RB/DEIM (n) # RB/DEIM (n) 

Fig. 5 Maximum projection and interpolation errors, as well as the predicted error bounds, for single waveforms as a function of the number of 
reduced basis (equal to the number of DEIM points) used. The left plot refers to ROQ built from frequency points sampled at Gauss-Legendre 
nodes, while the right one shows ROQ built from equidistant points, whence the error bounds given by Eq. \29\ (error bound 1, magenta) and 
Eq. {28} (error bound 2, red) are identical up to numerical accuracy (cf. remark|6}. In both figures "projection error" plots the square of the 
greedy error <T n (-F tra j n ; H) and "interpolation error" plots a maximum error ||/i/n c — T-n [Xa/Ic] lid ta ken over 10, 000 randomly drawn 
values of Ai c not necessarily in the training space. For both cases the maximum interpolation error is nearly identical, as expected. 



5.2.2 Direct and two-step greedy: results and comparison 

In the previous subsection we constructed an approximation space F n £s T . It was found that 178 basis elements are 
needed to represent any member of a 3000 member training set given by ( |32[l with an accuracy better than e 2 = 10~ 12 . 
We continue following Algorithm Q with an aim towards approximating T . Results for both the two-step and direct 
greedy algorithms are given, followed by a short discussion. 



Two-step greedy results: As discussed in Sec. 3.3 a second RB-greedy is used to build a space F m which approximates 



T. Following the prescription outlined there, we begin by building a training set Tn = {(Hi, fJ-j)}" ' j=\ and a training 
space F n 2 of associated normalized products {h*^ i h^ lj /\\h* lli h^ j ||d} " 7= i, where {/ii}™ =1 are the greedy points identified 
upon the first greedy approximation T « F n carried out in Sec. |5.2.1| Clearly our definition of F n i is not the only 
choice and, in particular, one may build a training space from normalized products of the orthgonalized basis {ej}^| 
(cf. Remark[3]l. 

Through numerical experiments we have found that 339 reduced basis elements are needed to achieve a tolerance 
of e 2 = 10~ 12 for approximation of F n 2. Fig.[6](left) shows the distribution of points selected by the greedy algorithm 
(the two-step greedy results are denoted by black circles). As expected they cluster towards lower values of M c (see 
Sec. |5.2.l) . The solid blue line labeled "projection error, Two-step" in Fig. [6] (right) plots the square of the greedy error 
a m {F n 2\'H), defined in Eq. l |12| >, over the training space of F n 2, as a function of the number of reduced basis elements. 
Furthermore, through Monte Carlo sampling of the continuum we find any waveform to be accurately represented (see 
"interpolation error" of Fig.[7j right) by these same 339 basis elements. 

Direct greedy results: One may consider direct approximation of T through a single RB-greedy (Path # 1 in Al- 
gorithm [4}. For most problems this will be prohibitively expensive even as an offline computation. Nevertheless, we 
provide details on it here mainly for comparison with the two-step approach (Path # 2 in Algorithm ffl. We take our 
training set to be a Cartesian product Tx = {(fJ-i, ^i)}^j=x °f the K = 3000 element training set given by \32) and a 
training space F K 2 to be the associated normalized products h^.h^j /\\hp i h IJ , j ||d- To achieve an approximation tolerance 
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Direct and two-step greedy points for products of waveforms 
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Fig. 6 Selected chirp mass J\4 C values (left) and square of the greedy error (right) for both the direct and two-step greedy approaches. 
Each approach seeks to compress a training space which consists of normalized products of waveforms. However, these training spaces are 
constructed in very different ways. The two-step training space is built from a Cartesian product of greedy points identified while building an 
RB space for waveforms (as opposed to their products) and thus has 178 2 members. A direct greedy uses a training space from the Cartesian 
product of the 1~k defined in Eq. \i2) and thus has 3000 2 members. Remarkably, despite these differences, we find that the exact number 
(339) of basis vectors are needed to achieve an error tolerance of e 2 = 10~ 12 for both the two-step and direct greedy. The left figure shows 
the point distribution is slightly different with more clustering at lower masses for the two-step process (black circles) as compared with the 
direct one (red squares). 



of e 2 = KT 12 we have found that 339 RB elements were needed. Fig. [6] (left) shows the distribution of points selected 
by the greedy algorithm (the direct greedy results are denoted by red squares). The dashed black line labeled "projection 
error, Direct" in Figure[6](right) shows the square of the greedy error <j m (J^2 ; H), defined in Eq. \Y1) , over the training 
space T K i , as a function of the number of RB elements. 

Discussion and comparison of two-step and direct greedy: We first notice that when building F n i from products of 
waveforms (as opposed to products of orthonormal basis vectors (cf. Remark [3}) one has F n i C F K i. In light of this 
observation one may view the two-step greedy as a smarter choice of sampling T . Furthermore, to better assist with 
our comparison, in the numerical experiments carried out above we have initialized both the two-step and direct greedy 
algorithms with the same seed. 

Remarkably, despite the differences in the two approaches, we find the greedy error curves are nearly identical and 
in both cases exactly 339 basis vectors are needed to achieve an error tolerance of e 2 = 1CP 12 (the greedy errors are 
defined with respect to different training spaces, however). In both cases after a slowly decaying region we observe very 
fast exponential convergence of the form Ce~ c °™ . For the two-step a m (F n 2 ; H) can be fitted by 

C = 4.19 x 1(T 3 , c = 0.981, and a = 0.923, 

while for the direct af^i^K 2 \ H) can be fitted by 

6 = 3.98 x 10~ 3 ,So = 1-07 and a = 0.875. 

The number m of basis found to be needed so that F m represents T within machine precision, m = 339, is 
remarkably close to twice that one needed so that F n represents T with the same accuracy, n = 178. If we were 
dealing with polynomials, it would be exactly m = In. From a practical perspective, with the two-step greedy approach 
we remove significant redundancy amongst elements of F n i where the dimensionality of the space to represent products 
of waveforms is compressed from n 2 to ~ 2n, with a compression ratio of about 90 for this problem. 

Notice the savings in the offline stage when building the reduced basis space F m using our two-step greedy approach, 
compared to a direct one. For the problem here considered, which is not particularly large in terms of the number of 
physical parameters (typically an 8 dimensional parameter space for a faithful description of compact binary coales- 
cences [ 19|), we needed 3, 000 training space points to build F n . Using a direct approach to build F m one needs 9 x 10 6 
training space points, compared to the 178 2 needed in the two-step greedy, with offline savings when building F m of 
~ 284. At the end of the day, both basis are able to accurately represent any waveform product and hence perfectly well 
suited for an ROQ construction. Clearly the two-step is preferable when considering these costs; we refer to Remark[7] 
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Remark 8 The savings for larger problems, for example when using a lower cutoff frequency of 10Hz, as estimated for 
the upcoming generation of earth-based detectors, or including spins in the modeling of each compact objects would be 
considerably larger (even in the absence of precession). For such cases we have typically used ~ 10 6 elements in the 
training space in order to build a high accuracy RB, with less than ~ 2, 000 RB elements needed to represent F n 11711181 
[T9l . For those cases a direct RB-greedy construction of F m would in principle require ~ 10 12 training space elements 
- our two-step greedy would then save a cost of ^ 10 5 . 



5.2.3 DEIM and ROQ for overlap/inner-product integration 

In the previous subsection we constructed the approximation space F m ~ T using a direct and two-step greedy algo- 
rithms. While both approaches identify an accurate and compact reduce basis set {ei\YLi approximating T, the two-step 
has a much smaller computational cost (see Remark|7]i. We now continue with Algorithm |4). The results shown are for 
a reduced basis generated from the two-step greedy. 

Discrete Empirical Interpolation: At this point we have a reduced basis set {e^lfl^ approximating T , which for this 
problem can be viewed as continuous functions of frequency / (they are known in closed form for any values of /). 
We now generate the corresponding set of interpolation points using the DEIM algorithm of Section |3.2| Results for 
Gauss-Legendre sampling (case 1) are shown in Figure [7J equidistant sampling is qualitatively similar. A distribution 
of selected DEIM points is depicted in the leftmost plot. Notice that the overall structure is very similar to the single 
waveform case shown in Figure [4(b)] 

For 20, 000 randomly sampled normalized waveform products g g T, not necessarily in the training space, the 
DEIM interpolant is evaluated using Eq. \24\ . The DEIM interpolation error is calculated in the L 2 norm as \\g—X m [g}\\%. 
Fig. [7] (right) compares the largest DEIM interpolation error over all 20, 000 products (solid blue line) with the square 
of the greedy error a m (F n 2 ; H) (solid black line). Notice that the DEIM interpolation error is bounded from above 
by the a-priori error estimates Eq. (28) (red line) and Eq. l |29| l (purple line). These two error estimates strictly hold for 
waveforms in the training space, and evidently continue to hold for waveforms outside of the training space when the 
latter is sufficiently dense. The red line provides a sharper bound on the error, but its also more expensive to compute 
(cf. Remark|6]l. 
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Fig. 7 The left figure shows the distribution of empirical interpolation points selected by the DEIM algorithm with the basis vectors sampled 
at 1, 701 Gauss-Legendre points. The right figure shows maximum projection and interpolation error (and error bounds) for products of 
waveforms as a function of the number of reduced basis used; notice that the profile is similar to that one of Fig. [5] On the right fi gure 
"projection error" plots the square of the greedy error a m (F n 2 ; H) and "interpolation error" plots a maximum error ||<? — I ra [g] ||^ taken 
over 20, 000 randomly drawn normalized products g £ T not necessarily in the training space. A- priori interpolation error bounds given by 
Eq. (28) (error bound 2) and Eq. (29) (error bound 1) are also shown. DEIM results for equidistant sampling are qualitatively similar to the 
Gauss-Legendre case. 



Reduced Order Quadratures: Having assembled a set of basis vectors and empirical interpolation points for products, 
we compute the reduced order quadrature weights ujf' ^ with Eq. l |25| > to complete the ROQ rule. Like the previous 
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experiments of this section, we randomly sample 20, 000 inner products (with normalized integrands) to compute, 
and monitor the maximum error from this computation. Figure [8] compares accuracy versus computational degrees of 
freedom (number of quadrature nodes) for a variety of integration schemes. The black and blue lines denote trapezoidal 
and Gauss-Legendre quadratures, respectively. Two cases are considered within reduced order quadratures. The ROQ 
leading to the red line stems from a reduced basis space {e^}^ with Gauss-Legendre nodes and underlying quadrature 
rule. The magenta one, in turn, uses equally spaced samples and a trapezoidal underlying rule. From Figure[8] one can 
see that ROQ have a factor of ~ 2 of savings when compared to Gauss-Legendre points for a maximum error below 
10~ 2 — 10 _1 , while the savings compared to the extended trapezoidal rule at high resolutions are greater than 50. 
This latter comparison is the one of interest for many data analysis applications where samples are often given on an 
equidistant grid spacing (downsampling). 



max integration error for different quadrature rules 
10 1 . 




# of quadrature points 



Fig. 8 We randomly draw 20, 000 normalized pair products g 6 T and for each compute a very accurate inner product which we take to 
be its exact value I c . Next, for each product we compute an inner product using a i) Gauss-Legendre quadrature (blue line), ii) trapezoidal 
rule (black line), iii) ROQ built from Gauss-Legendre quadratures (red line), and iv) ROQ built from the trapezoidal rule (pink line). In each 
of the four cases we monitor the maximum errors \I C — Id\ (for the Gauss-Legendre and trapezoidal rules) and \I C — /roqI (f° r me ROQ 
accelerated Gauss-Legendre and trapezoidal rules). Once the underlying product is well resolved by the empirical interpolant we see very fast 
exponential convergence of /roq ~~ ► (compare with "interpolation error" plotted in the right panel of Figurej7j. Notice that both ROQ 
rules perform similarly, which is to be expected whenever the underlying integration scheme is able to accurately integrate the reduced basis 
(cf. the discussion following Algorithm[4]and Fig. [5). After an error of about 10 _1 both ROQ rules outperform their discrete counterparts and 
in some cases significantly so. 



6 Final remarks 

Numerical integration is a well studied topic (see Refs. [55 24| for excellent introductions). One may wish to identify 
when Reduced Order Quadratures (ROQ) are likely to be a competitive option over alternative, more standard integration 
methods. First and foremost, the set of functions to be integrated should be well approximated by a small set of basis 
vectors. Although we have here focused on bases obtained through the Reduced Basis-greedy approach, ROQ apply un- 
changed to any other basis sets; for example, those obtained through a Proper Orthogonal/Singular Value decomposition. 
Second, since the cost of building the reduced basis either by a RB-greedy or POD/S VD is potentially large, it should be 
done offline. For larger problems, however, a direct reduced order modeling approach might be unfeasible, even when 
building the basis is an offline and parallelizable calculation. To overcome this we proposed a two-step greedy targeted 
towards approximation of products of functions (such as those appearing in weighted inner products) which, for the 
problem considered here, accelerated offline ROQ computations by two orders of magnitude. Finally, and crucially, one 
should determine which online costs to reduce. Benefits of ROQ are greatest when multiple fast evaluations of weighted 
inner products (or perhaps simply weighted integrals) are required between parametrized functions. Are the functional 
evaluations themselves costly? Quadrature rules which are designed for high-order integration of polynomials might 
lead to significantly more functional evaluations compared to an ROQ rule. Can one sample the function at arbitrary 
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points or is the data sampling specified? In our numerical experiments with gravitational waves the savings were mod- 
erate (a factor of ~ 2) when we were able to dictate the sampling location whereas they were much greater (a factor of 
~ 50) when the function was sampled at equally spaced points and integrated with an extended trapezoidal method. This 
suggests the possibility of using ROQ as an efficient downsampling criteria when doing matched filtering, for example 
(see (321). 

There are many potential applications of ROQ which deserve further consideration. As pointed out in Ref. |4), 
empirical interpolation is remarkably versatile in its handling of multi-dimensional problems on irregular domains, and 
a natural application of ROQ would be to such problems. Discontinuous functions (in physical space) with smooth 
variation with respect to parameters might admit a reduced basis approximating to the full space with fast convergence, 
and if so it might be possible to generate fast converging ROQ for a particular family of discontinuous functions. 
Furthermore, as pointed out in Sec. [4] the hierarchical construction of ROQ node locations allows for very natural 
application-specific nested quadrature rules of arbitrary order and depth. 
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A Notation 

List of symbols: 

- T: parameterized space of target functions. 

- /t: parameter (in general, a multi-dimensional one). 

- fii , fij : numerical values of the parameter fx. 

- T: parameterized space of product of target functions. 

- /Ifci in sets arising from products, such as T 2 , ~fik(i,j) = (Mi> Mj) * s a re-indexing of a two dimensional array {(fJ-i, /Uj)}"j = i into a one 

dimensional array {Jlk}2 = i> where k is a one-to-one function which takes two integers i and j and returns a unique integer k(i,j). 

- K: number of training space elements sampling T . 

- K 2 : number of training space elements in principle needed to directly sample T . 

- n 2 : number of training space elements used in our two-step greedy approach to sample T. In general n 2 -C K 2 . 

- M: number of samples or quadrature points where functions to be integrated are evaluated. 

- n,m: number of DEIM points (and reduced basis) to approximate elements in T and T receptively. 

- Discrete L°° norm of a g evaluated at M quadrature points is = maxKj<jvf 

- The spectral 2-norm of a matrix Q 6 C Mx ° is defined by 

|||Q||| 2 := max = (A max (QtQ)) ^ , 

u^o ||u|| d V J 

with t denoting conjugate transposition. The matrix oo-norm is given by 

n 

IIIQIIIco = max El? M |, 

- - f=l 

respectively, where q^i denotes the (k, £) entry of Q. 



B FLOP Count 

In this appendix we provide the asymptotic FLOP count for a variety of algorithms discussed through the paper. These numbers are used for 
comparing the relative costs of different algorithms. 



B.l DEIM FLOP Count 

In Algorifhm|2|as P T U is a lower triangular matrix therefore (after m iterations) Step (4) costs 0(m 3 ), Step (5) costs O(Mrn) subtractions 
and 0(MmrJmatnx vector multiplications, where latter is the dominant one. The improvement in our implementation of Algorithm [5] as 
compared to \2\ is that the our implementation scales as C(m 3 ) as compared to 0(m 4 ) in [2 1. Therefore the total DEIM cost is: 

0(Mm 2 + m 3 ) . 
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B.2 ROQ FLOP Count 



The dominant cost (assuming M > m) of computing reduced order quadrature weights ^tt> HC> QJ = lj t V ^P T Vj in (25) is 

0(Mm 2 ). Here we used the fact that the operation P T V is equivalent to selecting m rows corresponding to DEIM points {pt}7L\ ■ 
Then the overall cost to compute ROQ using one-step (Algorithm[T][Pi3;/! #Y\ with modified Gram-Schmidt in greedy is: 

0{K 2 Miri + Mm 2 ) 

and two-step (Algorithm[T][ftoft #2] is: 

O (n 2 Mm + KMn + Mm 2 ) . 

Here m, in denote the number of reduced basis used in approximation of T via two-step and direct approach respectively and n is the number 
of reduced basis used to approximate T . 



C DEIM and LU Decomposition with Partial Pivoting 

The original DEIM described in \1\ has as Step 7 

U=[U f%], i = 2,...n. 

Since the columns of U are linearly independent, P T U is always invertible, but in this form the matrix P T U can be dense. Next we show 
that with a slight modification to the original DEIM, we get Algorifhm[2]and at every iteration i = 2, . . . , n with i-j := r, 

u := [u n] , 

gives the same result. One of the advantages of looking at DEIM in this format is that the matrix P T U is lower triangular. Next we prove that 
we get the same result using both formats. 

Proposition 1 In $2%, Step 7 of Algorithm^was presented as U = [U e, ■], i = 2, . . . , n, we can replace it by 



U = [U n], i = 2, 
where := r, at every iteration, and r is as given in Step 5 of Algorithm^ 
Proof Step 4 of Algorithm[2] at i = n, gives the coefficient matrix 

/ ei(pi) e 2 (pi) 
ei(p2) e2(p2) 
p^U= e i(P3) e 2 (p 3 ) 



.,71, 



e„_i(pi) \ 
en-l(P2) 
e n -i(P3) 



^ei(p„_i) e 2 (p n _i) ■■■ e n _i(p n _i) J 
Next we write the row reduced echelon form using forward Gaussian elimination for (P T U) T , where the first step implies 



( ei(pi) 
ei(p 2 ) r 2 (p2) 
ei(p3) r 2 (p 3 ) 



\ 



ei(Pn-i) r 2 (p„_i) 
The final row reduced echelon form for (P T U) T is 



en-l(Ps) 
e„-l(Pri-l) 



( e x (pi) 
ei(p2) r2(p 2 ) 
eife) r 2 (p 3 ) 

V ei(p n -l) r2(pu-l) 



r n (Pn-l) j 



(33) 



Hence the proposition. 



We note that advantage of using r in Step 7 of Algorithm^ is that then in Step 4, (P T U) is a lower triangular matrix, therefore the 
system can be solved for c with 0(n 2 ) operations, as compared to using ej, which gives a dense matrix (P T U), solving for c requires 
0(n 3 ) operations. 



Two-step greedy algorithm for reduced order quadratures 



27 



Remark 9 Algorithm[2]has a flavor of Gauss elimination with row pivoting. We showed the elimination part in Proposition[T] Next we will 
show that the DEIM points {pi , . . . , p n } are in fact the pivots, similar to Gauss elimination. Our goal is to arrive at \33\ row echelon form of 
(P T U) T with with partial pivoting. Consider the matrix 



( ei(pi) e 2 (pi) 

ei(p 2 ) e 2 (p2) 

ei(ps) e 2 (p 3 ) 

^ei(p„_i) e2(pn-l) 



e n -l(pi) \ 
e n -i(P2) 
Bn-l(ps) 



L(pn-l) J 



Let the location of the first pivot, i.e., the maximum of the absolute value of the first column be pi (this is the same as Step 1 in Algorithm[2j. 
Apply the first step of forward elimination as in the proof of Proposition[T| Next compute the second pivot p 2 , which is the maximum of the 
absolute value of the so-obtained second column. Following in this manner we arrive at the matrix in J33). 
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